A Proposed Unicode-Based Extended Romanization System for Persian Texts

Authors

M. A. Mahdavi, Ph.D., Imam Khomeini International University

Abstract

So far, various romanization schemes have been proposed for capturing Persian text in the Latin alphabet. However, each has served a very specific and limited function. This paper proposes an extended romanization scheme that can support a wide range of encoding needs in natural language processing. The proposed scheme endeavors to preserve both orthographic and phonological phenomena in the language. It also accounts for encoding handwritten manuscripts, in which glyph ambiguity is a salient feature. It is particularly relevant to romanizing the Kufi script, in which diacritical marks are omitted. The current work also recommends orthographic rules in an effort to standardize future romanization tasks.
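
As a rough illustration of the glyph ambiguity mentioned in the abstract: in undotted writing, letters that differ only by their dots collapse into a single shape. The Python sketch below encodes each ambiguous shape as one Latin symbol and keeps the candidate readings recoverable. The class labels (`B`, `J`, `D`, `R`, `S`), the groupings, and the function names are hypothetical choices made for this example. They are not the mapping proposed in the paper.

```python
# Hypothetical illustration only: the labels and groupings below are
# assumptions for this sketch, not the scheme proposed in the paper.

# Each Latin label stands for one undotted letter shape; the value lists
# the Persian letters that share that shape once dots are omitted.
SKELETON_GROUPS = {
    "B": "\u0628\u067e\u062a\u062b",  # beh, peh, teh, theh
    "J": "\u062c\u0686\u062d\u062e",  # jeem, tcheh, hah, khah
    "D": "\u062f\u0630",              # dal, thal
    "R": "\u0631\u0632\u0698",        # reh, zain, jeh
    "S": "\u0633\u0634",              # seen, sheen
}

# Inverted lookup: Persian letter -> shape label.
LETTER_TO_SKELETON = {
    letter: label
    for label, letters in SKELETON_GROUPS.items()
    for letter in letters
}


def romanize_undotted(text: str) -> str:
    """Replace each listed letter with its shape label; pass others through."""
    return "".join(LETTER_TO_SKELETON.get(ch, ch) for ch in text)


def candidates(label: str) -> list[str]:
    """Return the letters an ambiguous shape label could stand for."""
    return list(SKELETON_GROUPS.get(label, label))
```

For example, `romanize_undotted` maps both beh followed by teh and teh followed by beh to the same string `BB`, and `candidates("B")` lists the four letters a reader would have to choose among when resolving that shape.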

Similar resources

A Revised Unicode based Sorting Algorithm for Bengali Texts

This paper describes a sorting algorithm for Bengali texts which is one of the most vital tasks for Bengali Natural Language Processing. As Unicode is much more preferable than ASCII encoding, we need to use this representation for Bengali Language. But due to some distinct properties of Bengali Language, they cannot be sorted directly using the order in Unicode character scheme. A few works ha...

A Plagiarism Detection Approach Based on SVM for Persian Texts

Plagiarism is defined as an unauthorized act of using or adapting others’ works and ideas without referring to them. Numerous methods have been proposed to detect plagiarism in different languages; however, not a lot has been accomplished in Persian. The present study has utilized statistical and semantic features to determine the functionality of Support Vector Machines (SVMs) in detecting act...

Proposed Update Unicode Technical Report

Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized. This document describes some of the security considerations that programmers, system analysts, standards developers, and user...

Rumi Numeral System Symbols, Additional characters proposed to Unicode

A special numeral system, Rumi, has been in use in North Africa since the 10th century. It remained in use until the 17th century. This system was used especially in the administration of the city of Fez in Morocco. It was also used in Al-Andalus, Spain, starting from the 12th century. The forms of the digits are quite different from the Arabic or the Arabic-Indic digits in use today....

Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course “The Study of Islamic Texts Translation”

The aim of the present study is to propose a model based on speech acts and language functions for developing materials for the course "The Study of Islamic Texts Translation". In the new model, in order to develop better and more engaging materials, and unlike existing books, Austin's (1962) model of speech act levels, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions were employed. For this purpose, 57 noble verses were randomly selected from different parts of the Quran...

Journal:
International Journal of Information Science and Management

Volume 10, Issue 1, pages 57-71
